-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure Panama float vector distance impls inlinable #14031
Conversation
… method to be inlined (325)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool; LGTM
good here too. we can also save another 5 bytes with something like this. it seems to help me a tiny bit according to the JMH too. not sure if it makes the code harder or easier to read/maintain. i sorta like today that it is clear at a glance there are no data dependencies. We could also move the i2/i3/4 to top of loop to accomplish that if we wanted.
|
We can iterate on last patch and save a few more bytes (302b) if we just pull out into a static final constant instead, too:
I feel like it makes the code a bit easier on the eyes, and benchie is happy:
|
@rmuir nice!!! wanna push that to the branch? Then I’ll do some more benchmark runs tomorrow too. |
I applied and tested the same approach with the other 2 functions too. cosine was already underweight: it is only unrolled twice due to complexity of the mathematical formula, but it keeps the floats consistent. we could tidy up the binary ones in similar fashion as a followup for more consistency, but since jvm can already unroll the integer math, they arent unrolled and i expect they are already under limit. microbenchmarks seem happy but I assume the real gains are from more macrobenchmark where the inlining can help.
edit: re-ran |
++
Right. The microbenchmarks show some modest improvement, but it seemed reasonably straightforward to eliminate not being inlined from the equation when trawling over local luceneutil runs. I don't have specific numbers yet, but let's merge this change as an incremental improvement and keep an eye on Mike's nightly benchmark runs. :-) |
FTR Apple M2 Pro
Intel SkyLake
|
This commit reduces the Panama vector distance float implementations to less than the maximum bytecode size of a hot method to be inlined (325). E.g. Previously: org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport::dotProductBody (355 bytes) failed to inline: callee is too large. After: org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport::dotProductBody (3xx bytes) inline (hot) This helps things a little. Co-authored-by: Robert Muir <rmuir@apache.org>
FYI nightly benchmarks had a big regression last night, and this is the only change I can find that could have caused this: https://benchmarks.mikemccandless.com/VectorSearch.html. |
@jpountz lets just revert it and figure it out separately? |
This commit reduces the Panama vector distance float implementations to less than the maximum bytecode size of a hot method to be inlined (325).
E.g. Previously:
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport::dotProductBody (355 bytes) failed to inline: callee is too large
.After:
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport::dotProductBody (311 bytes) inline (hot)
This helps things a little.